Machine Learning Analysis Report

Generated on August 03, 2025 at 09:04 PM

Machine Learning Analysis Pipeline

EDR: Dataset Loading & Preprocessing

EDR – Train/Test Overview
• Train shape: (17536, 20) | Test shape: (1535, 20)
• Total train samples: 17,536 | Total test samples: 1,535
• Number of features: 18
• Target column: 'label'
• Missing values (train): 0 | (test): 0
EDR – Train Class Distribution
• 0: 16,679
• 1: 857
• Class balance (minority/majority): 5.1382%
EDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
Baseline (Most-Frequent) Accuracy: 0.9518
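
The feature preparation steps listed above (infinite values handled, missing values filled with train medians, scaling fit on train only) can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline's actual code: the function names and the toy column are hypothetical, infinities are assumed to be treated like missing values, and the scaling follows the (x - mean) / population-std formula that StandardScaler uses.

```python
import math
import statistics

def fit_column(train_values):
    """Learn the fill value and scaling statistics from the train split only."""
    finite = [v for v in train_values if math.isfinite(v)]
    median = statistics.median(finite)
    # Fill inf/nan with the train median before computing the scaling stats,
    # so the scaler sees the same filled column the model will see.
    filled = [v if math.isfinite(v) else median for v in train_values]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled) or 1.0  # constant column: leave values unscaled
    return {"median": median, "mean": mean, "std": std}

def transform_column(values, params):
    """Apply the train-derived statistics to any split (train or test)."""
    scaled = []
    for v in values:
        if not math.isfinite(v):
            v = params["median"]
        scaled.append((v - params["mean"]) / params["std"])
    return scaled

# Hypothetical toy feature column, for illustration only.
train = [1.0, 2.0, float("inf"), 3.0, float("nan")]
test = [2.0, float("-inf")]

params = fit_column(train)
train_scaled = transform_column(train, params)
test_scaled = transform_column(test, params)
```

Because the median, mean and standard deviation all come from the train split, no test-set statistics leak into the transformation.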

EDR: Model Performance Comparison

EDR – Model Performance Metrics

Model | Accuracy | Balanced Acc | Precision | Recall | F1 | ROC-AUC | PR-AUC
Logistic Regression | 0.9303 | 0.5849 | 0.2381 | 0.2027 | 0.2190 | 0.6400 | 0.1425
Random Forest (SMOTE) | 0.9414 | 0.5715 | 0.3000 | 0.1622 | 0.2105 | 0.7280 | 0.1860
LightGBM | 0.9440 | 0.5536 | 0.3000 | 0.1216 | 0.1731 | 0.8489 | 0.2193
Balanced RF | 0.8925 | 0.6677 | 0.2026 | 0.4189 | 0.2731 | 0.8290 | 0.1893
SGD SVM | 0.0860 | 0.5198 | 0.0501 | 1.0000 | 0.0954 | nan | nan
IsolationForest | 0.8397 | 0.5502 | 0.0825 | 0.2297 | 0.1214 | nan | nan

Confusion Matrix Analysis

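Model | TN | FP | FN | TP | FP Rate | Miss Rate
Logistic Regression | 1413 | 48 | 59 | 15 | 3.29% | 79.73%
Random Forest (SMOTE) | 1433 | 28 | 62 | 12 | 1.92% | 83.78%
LightGBM | 1440 | 21 | 65 | 9 | 1.44% | 87.84%
Balanced RF | 1339 | 122 | 43 | 31 | 8.35% | 58.11%
SGD SVM | 58 | 1403 | 0 | 74 | 96.03% | 0.00%
IsolationForest | 1272 | 189 | 57 | 17 | 12.94% | 77.03%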
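
Every threshold-based metric in the two tables above follows from the four confusion-matrix counts. The sketch below shows the arithmetic for one row; the function name is hypothetical and the zero-denominator guards are an assumption about how empty cases would be handled.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive the report's threshold-based metrics from one confusion-matrix row."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    predicted_pos = tp + fp
    f1_denom = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "balanced_acc": (recall + specificity) / 2,
        "precision": tp / predicted_pos if predicted_pos else 0.0,
        "recall": recall,
        "f1": 2 * tp / f1_denom if f1_denom else 0.0,
        "fp_rate": fp / (tn + fp),    # share of negatives flagged as positive
        "miss_rate": fn / (tp + fn),  # share of positives missed (1 - recall)
    }

# EDR LightGBM row: TN=1440, FP=21, FN=65, TP=9
lightgbm = metrics_from_counts(tn=1440, fp=21, fn=65, tp=9)
for name, value in lightgbm.items():
    print(f"{name}: {value:.4f}")
```

For the EDR LightGBM row this reproduces the reported accuracy 0.9440, balanced accuracy 0.5536, precision 0.3000, recall 0.1216, F1 0.1731, FP rate 1.44% and miss rate 87.84%.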

Best Models by Metric

Metric | Best Model | Value
Accuracy | LightGBM | 0.9440
Balanced Acc | Balanced RF | 0.6677
Precision | Random Forest (SMOTE) | 0.3000
Recall | SGD SVM | 1.0000
F1 | Balanced RF | 0.2731
ROC-AUC | LightGBM | 0.8489
PR-AUC | LightGBM | 0.2193
Lowest False Positive Rate | LightGBM | 1.44%
Lowest Miss Rate | SGD SVM | 0.00%
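
The per-metric winners above can be reproduced from the EDR Model Performance Metrics table with a simple maximum over each column. The sketch below is illustrative only: the helper name is hypothetical, nan entries are assumed to be skipped, and ties are assumed to go to the first model in table order.

```python
import math

COLUMNS = ("Accuracy", "Balanced Acc", "Precision", "Recall", "F1", "ROC-AUC", "PR-AUC")
NAN = float("nan")  # metric not reported for that model

# Rows copied from the EDR Model Performance Metrics table.
EDR_ROWS = {
    "Logistic Regression":   (0.9303, 0.5849, 0.2381, 0.2027, 0.2190, 0.6400, 0.1425),
    "Random Forest (SMOTE)": (0.9414, 0.5715, 0.3000, 0.1622, 0.2105, 0.7280, 0.1860),
    "LightGBM":              (0.9440, 0.5536, 0.3000, 0.1216, 0.1731, 0.8489, 0.2193),
    "Balanced RF":           (0.8925, 0.6677, 0.2026, 0.4189, 0.2731, 0.8290, 0.1893),
    "SGD SVM":               (0.0860, 0.5198, 0.0501, 1.0000, 0.0954, NAN, NAN),
    "IsolationForest":       (0.8397, 0.5502, 0.0825, 0.2297, 0.1214, NAN, NAN),
}

def best_by_metric(rows, columns):
    """Return {metric: (model, value)} using the column maximum, skipping nan."""
    best = {}
    for i, metric in enumerate(columns):
        candidates = [(model, vals[i]) for model, vals in rows.items()
                      if not math.isnan(vals[i])]
        # max() keeps the first maximal item, so ties follow table order.
        best[metric] = max(candidates, key=lambda pair: pair[1])
    return best

winners = best_by_metric(EDR_ROWS, COLUMNS)
for metric, (model, value) in winners.items():
    print(f"{metric}: {model} ({value:.4f})")
```

Note that Random Forest (SMOTE) and LightGBM are tied on precision at 0.3000 (12 of 40 and 9 of 30 predicted positives, respectively), so the listed precision winner depends on the tie-break rule.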

EDR – Metrics by Model (figure)

EDR – ROC Curves (figure)

EDR – Precision–Recall Curves (figure)

EDR – Predicted Probability Distributions (figure)

EDR – Threshold Sweep (figure)

EDR: Logistic Regression – Detailed Analysis

EDR – Logistic Regression: Confusion Matrix (figure)

EDR – Logistic Regression: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9599 | 0.9671 | 0.9635 | 1461
1 | 0.2381 | 0.2027 | 0.2190 | 74
accuracy | nan | nan | 0.9303 | 1535
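
Both rows of the classification report above can be derived from the same confusion-matrix counts; the class 0 row simply treats the negative class as the one being detected. The sketch below uses a hypothetical helper and assumes every denominator is non-zero, which holds for the counts in this report.

```python
def per_class_report(tn, fp, fn, tp):
    """Per-class precision, recall, F1 and support for a binary confusion matrix."""
    def row(hits, false_alarms, misses):
        precision = hits / (hits + false_alarms)
        recall = hits / (hits + misses)
        f1 = 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall,
                "f1": f1, "support": hits + misses}

    return {
        # Class 0: true negatives are the hits, FN are wrongly predicted as 0.
        0: row(hits=tn, false_alarms=fn, misses=fp),
        # Class 1: the usual positive-class view.
        1: row(hits=tp, false_alarms=fp, misses=fn),
    }

# EDR Logistic Regression counts: TN=1413, FP=48, FN=59, TP=15
report = per_class_report(tn=1413, fp=48, fn=59, tp=15)
for label, stats in report.items():
    print(label, {k: round(v, 4) for k, v in stats.items()})
```

With the EDR Logistic Regression counts this yields 0.9599 / 0.9671 / 0.9635 (support 1461) for class 0 and 0.2381 / 0.2027 / 0.2190 (support 74) for class 1, matching the report.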

EDR – Logistic Regression: Feature Importance (figure)

EDR: Random Forest (SMOTE) – Detailed Analysis

EDR – Random Forest (SMOTE): Confusion Matrix (figure)

EDR – Random Forest (SMOTE): Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9585 | 0.9808 | 0.9696 | 1461
1 | 0.3000 | 0.1622 | 0.2105 | 74
accuracy | nan | nan | 0.9414 | 1535

EDR – Random Forest (SMOTE): Feature Importance (figure)

EDR: LightGBM – Detailed Analysis

EDR – LightGBM: Confusion Matrix (figure)

EDR – LightGBM: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9568 | 0.9856 | 0.9710 | 1461
1 | 0.3000 | 0.1216 | 0.1731 | 74
accuracy | nan | nan | 0.9440 | 1535

EDR – LightGBM: Feature Importance (figure)

EDR: Balanced RF – Detailed Analysis

EDR – Balanced RF: Confusion Matrix (figure)

EDR – Balanced RF: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9689 | 0.9165 | 0.9420 | 1461
1 | 0.2026 | 0.4189 | 0.2731 | 74
accuracy | nan | nan | 0.8925 | 1535

EDR – Balanced RF: Feature Importance (figure)

EDR: SGD SVM – Detailed Analysis

EDR – SGD SVM: Confusion Matrix (figure)

EDR – SGD SVM: Classification Report

Class | Precision | Recall | F1 | Support
0 | 1.0000 | 0.0397 | 0.0764 | 1461
1 | 0.0501 | 1.0000 | 0.0954 | 74
accuracy | nan | nan | 0.0860 | 1535

EDR – SGD SVM: Feature Importance (figure)

EDR: IsolationForest – Detailed Analysis

EDR – IsolationForest: Confusion Matrix (figure)

EDR – IsolationForest: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9571 | 0.8706 | 0.9118 | 1461
1 | 0.0825 | 0.2297 | 0.1214 | 74
accuracy | nan | nan | 0.8397 | 1535

EDR – IsolationForest: Feature Importance

Feature importance not available for this model type.

XDR: Dataset Loading & Preprocessing

XDR – Train/Test Overview
• Train shape: (17536, 34) | Test shape: (1535, 34)
• Total train samples: 17,536 | Total test samples: 1,535
• Number of features: 32
• Target column: 'label'
• Missing values (train): 0 | (test): 0
XDR – Train Class Distribution
• 0: 16,679
• 1: 857
• Class balance (minority/majority): 5.1382%
XDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
Baseline (Most-Frequent) Accuracy: 0.9518
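
The class balance and baseline figures above follow directly from the class counts. The sketch below reproduces both; it assumes the most-frequent baseline is scored on the test split (support 1461 and 74, taken from the classification reports), which is consistent with the reported 0.9518, whereas the train split would give roughly 0.9511.

```python
train_counts = {0: 16679, 1: 857}  # train class distribution
test_support = {0: 1461, 1: 74}    # test support per class (from the classification reports)

# Class balance as reported: minority count divided by majority count.
minority = min(train_counts.values())
majority = max(train_counts.values())
class_balance_pct = 100 * minority / majority

# Most-frequent baseline: always predict the majority train class,
# then score that constant prediction on the test split (assumption).
majority_class = max(train_counts, key=train_counts.get)
baseline_acc = test_support[majority_class] / sum(test_support.values())

print(f"Class balance: {class_balance_pct:.4f}%")
print(f"Baseline accuracy: {baseline_acc:.4f}")
```

Any model whose accuracy falls below this baseline, such as Logistic Regression at 0.7420 on XDR, is doing worse on accuracy than always predicting class 0, which is why the report also lists balanced accuracy, recall and PR-AUC.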

XDR: Model Performance Comparison

XDR – Model Performance Metrics

Model | Accuracy | Balanced Acc | Precision | Recall | F1 | ROC-AUC | PR-AUC
Logistic Regression | 0.7420 | 0.6271 | 0.0934 | 0.5000 | 0.1574 | 0.6204 | 0.1363
Random Forest (SMOTE) | 0.9414 | 0.5587 | 0.2778 | 0.1351 | 0.1818 | 0.6934 | 0.1801
LightGBM | 0.9492 | 0.5628 | 0.4167 | 0.1351 | 0.2041 | 0.8370 | 0.2122
Balanced RF | 0.9036 | 0.6928 | 0.2394 | 0.4595 | 0.3148 | 0.8368 | 0.1678
SGD SVM | 0.1173 | 0.5299 | 0.0512 | 0.9865 | 0.0973 | nan | nan
IsolationForest | 0.9199 | 0.5474 | 0.1449 | 0.1351 | 0.1399 | nan | nan

Confusion Matrix Analysis

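Model | TN | FP | FN | TP | FP Rate | Miss Rate
Logistic Regression | 1102 | 359 | 37 | 37 | 24.57% | 50.00%
Random Forest (SMOTE) | 1435 | 26 | 64 | 10 | 1.78% | 86.49%
LightGBM | 1447 | 14 | 64 | 10 | 0.96% | 86.49%
Balanced RF | 1353 | 108 | 40 | 34 | 7.39% | 54.05%
SGD SVM | 107 | 1354 | 1 | 73 | 92.68% | 1.35%
IsolationForest | 1402 | 59 | 64 | 10 | 4.04% | 86.49%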

Best Models by Metric

Metric | Best Model | Value
Accuracy | LightGBM | 0.9492
Balanced Acc | Balanced RF | 0.6928
Precision | LightGBM | 0.4167
Recall | SGD SVM | 0.9865
F1 | Balanced RF | 0.3148
ROC-AUC | LightGBM | 0.8370
PR-AUC | LightGBM | 0.2122
Lowest False Positive Rate | LightGBM | 0.96%
Lowest Miss Rate | SGD SVM | 1.35%

XDR – Metrics by Model (figure)

XDR – ROC Curves (figure)

XDR – Precision–Recall Curves (figure)

XDR – Predicted Probability Distributions (figure)

XDR – Threshold Sweep (figure)

XDR: Logistic Regression – Detailed Analysis

XDR – Logistic Regression: Confusion Matrix (figure)

XDR – Logistic Regression: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9675 | 0.7543 | 0.8477 | 1461
1 | 0.0934 | 0.5000 | 0.1574 | 74
accuracy | nan | nan | 0.7420 | 1535

XDR – Logistic Regression: Feature Importance (figure)

XDR: Random Forest (SMOTE) – Detailed Analysis

XDR – Random Forest (SMOTE): Confusion Matrix (figure)

XDR – Random Forest (SMOTE): Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9573 | 0.9822 | 0.9696 | 1461
1 | 0.2778 | 0.1351 | 0.1818 | 74
accuracy | nan | nan | 0.9414 | 1535

XDR – Random Forest (SMOTE): Feature Importance (figure)

XDR: LightGBM – Detailed Analysis

XDR – LightGBM: Confusion Matrix (figure)

XDR – LightGBM: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9576 | 0.9904 | 0.9738 | 1461
1 | 0.4167 | 0.1351 | 0.2041 | 74
accuracy | nan | nan | 0.9492 | 1535

XDR – LightGBM: Feature Importance (figure)

XDR: Balanced RF – Detailed Analysis

XDR – Balanced RF: Confusion Matrix (figure)

XDR – Balanced RF: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9713 | 0.9261 | 0.9481 | 1461
1 | 0.2394 | 0.4595 | 0.3148 | 74
accuracy | nan | nan | 0.9036 | 1535

XDR – Balanced RF: Feature Importance (figure)

XDR: SGD SVM – Detailed Analysis

XDR – SGD SVM: Confusion Matrix (figure)

XDR – SGD SVM: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9907 | 0.0732 | 0.1364 | 1461
1 | 0.0512 | 0.9865 | 0.0973 | 74
accuracy | nan | nan | 0.1173 | 1535

XDR – SGD SVM: Feature Importance (figure)

XDR: IsolationForest – Detailed Analysis

XDR – IsolationForest: Confusion Matrix (figure)

XDR – IsolationForest: Classification Report

Class | Precision | Recall | F1 | Support
0 | 0.9563 | 0.9596 | 0.9580 | 1461
1 | 0.1449 | 0.1351 | 0.1399 | 74
accuracy | nan | nan | 0.9199 | 1535

XDR – IsolationForest: Feature Importance

Feature importance not available for this model type.